A TaLISMAN: Automatic Text and LIne Segmentation of historical MANuscripts
Abstract
Historical and artistic handwritten books are valuable cultural heritage (CH) items, as they provide information about tangible and intangible cultural aspects of the past. Massive digitization projects have made this kind of data available to a worldwide population, and they pose real challenges for automatic processing. In this scenario, document layout analysis plays a significant role, being a fundamental step of any document image understanding system. In this paper, we present a completely automatic algorithm to perform robust text segmentation of old handwritten manuscripts on a per-book basis, and we show how to exploit this outcome to find two layout elements, i.e., text blocks and text lines. Our proposed technique has been evaluated on a large and heterogeneous corpus, and our experimental results demonstrate that this approach is efficient and reliable, even when applied to very noisy and damaged books.
Similar resources
Margins are more important than text, Historical values of margins, memorial notes and colophons of Manuscripts in Zoroastrian tradition
In the Zoroastrian tradition, the most important challenge and the most ambiguous issue is the ambiguity of history and the neglect of time and chronology. Perhaps the view that historical time is limited, that the beginning and end of time are clear, and that goodness will eventually prevail is the reason for the ambiguity of history in the Zoroastrian tradition. Since early times until now, the Zoroastrian re...
Radial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting
Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exi...
Image Segmentation of Historical Handwriting from Palm Leaf Manuscripts
Palm leaf manuscripts were one of the earliest forms of written media and were used in Southeast Asia to store early written knowledge about subjects such as medicine, Buddhist doctrine and astrology. Historical handwritten palm leaf manuscripts are therefore important to people who want to study historical documents, because much can be learned from them. This paper presents a...
Robust Line Detection in Historical Church Registers
To automatically acquire information recorded in church registers and other historical scriptures, the text of such documents needs to be segmented prior to automatic reading. Segmentation of old handwritten scriptures is difficult for two main reasons: lines of text are generally not straight, and ascenders and descenders of adjacent lines interfere. The algorithms described in ...
Line Detection and Segmentation in Historical Church Registers
To automatically acquire the information recorded in church registers and other historical scriptures, the writing on these documents has to be recognized. This paper describes algorithms for transforming the paper documents into a representation of text suitable as input for an automatic text recognizer. The automatic recognition of old handwritten scriptures is difficult ...
Publication date: 2014